Goto

Collaborating Authors

 joint attention


TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

arXiv.org Artificial Intelligence

Vision-language-action models (VLAs) trained on large-scale robotic datasets have demonstrated strong performance on manipulation tasks, including bimanual tasks. However, because most public datasets focus on single-arm demonstrations, adapting VLAs for bimanual tasks typically requires substantial additional bimanual data and fine-tuning. To address this challenge, we introduce TwinVLA, a modular framework that composes two copies of a pretrained single-arm VLA into a coordinated bimanual VLA. Unlike monolithic cross-embodiment models trained on mixtures of single-arm and bimanual data, TwinVLA improves both data efficiency and performance by composing pretrained single-arm policies. Across diverse bimanual tasks in real-world and simulation settings, TwinVLA outperforms a comparably-sized monolithic RDT-1B model without requiring any bimanual pretraining. Furthermore, it narrows the gap to state-of-the-art model, $ฯ€_0$ which rely on extensive proprietary bimanual data and compute cost. These results establish our modular composition approach as a data-efficient and scalable path toward high-performance bimanual manipulation, leveraging public single-arm data.


Teamwork: Collaborative Diffusion with Low-rank Coordination and Adaptation

arXiv.org Artificial Intelligence

Large pretrained diffusion models can provide strong priors beneficial for many graphics applications. However, generative applications such as neural rendering and inverse methods such as SVBRDF estimation and intrinsic image decomposition require additional input or output channels. Current solutions for channel expansion are often application specific and these solutions can be difficult to adapt to different diffusion models or new tasks. This paper introduces Teamwork: a flexible and efficient unified solution for jointly increasing the number of input and output channels as well as adapting a pretrained diffusion model to new tasks. Teamwork achieves channel expansion without altering the pretrained diffusion model architecture by coordinating and adapting multiple instances of the base diffusion model (\ie, teammates). We employ a novel variation of Low Rank-Adaptation (LoRA) to jointly address both adaptation and coordination between the different teammates. Furthermore Teamwork supports dynamic (de)activation of teammates. We demonstrate the flexibility and efficiency of Teamwork on a variety of generative and inverse graphics tasks such as inpainting, single image SVBRDF estimation, intrinsic decomposition, neural shading, and intrinsic image synthesis.


Gaze Behavior During a Long-Term, In-Home, Social Robot Intervention for Children with ASD

arXiv.org Artificial Intelligence

Atypical gaze behavior is a diagnostic hallmark of Autism Spectrum Disorder (ASD), playing a substantial role in the social and communicative challenges that individuals with ASD face. This study explores the impacts of a month-long, in-home intervention designed to promote triadic interactions between a social robot, a child with ASD, and their caregiver. Our results indicate that the intervention successfully promoted appropriate gaze behavior, encouraging children with ASD to follow the robot's gaze, resulting in more frequent and prolonged instances of spontaneous eye contact and joint attention with their caregivers. Additionally, we observed specific timelines for behavioral variability and novelty effects among users. Furthermore, diagnostic measures for ASD emerged as strong predictors of gaze patterns for both caregivers and children. These results deepen our understanding of ASD gaze patterns and highlight the potential for clinical relevance of robot-assisted interventions.


Advancing Robot-Assisted Autism Therapy: A Novel Algorithm for Enhancing Joint Attention Interventions

arXiv.org Artificial Intelligence

Recent studies have revealed that using social robots can accelerate the learning process of several skills in areas where autistic children typically show deficits. However, most early research studies conducted interactions via free play. More recent research has demonstrated that robot-mediated autism therapies focusing on core impairments of autism spectrum disorder (e.g., joint attention) yield better results than unstructured interactions. This paper aims to systematically review the most relevant findings concerning the application of social robotics to joint attention tasks, a cardinal feature of autism spectrum disorder that significantly influences the neurodevelopmental trajectory of autistic children. Initially, we define autism spectrum disorder and explore its societal implications. Following this, we examine the need for technological aid and the potentialities of robot-assisted autism therapy. We then define joint attention and highlight its crucial role in children's social and cognitive development. Subsequently, we analyze the importance of structured interactions and the role of selecting the optimal robot for specific tasks. This is followed by a comparative analysis of the works reviewed earlier, presenting an in-depth examination of two distinct formal models employed to design the prompts and reward system that enables the robot to adapt to children's responses. These models are critically compared to highlight their strengths and limitations. Next, we introduce a novel algorithm to address the identified limitations, integrating interactive environmental factors and a more sophisticated prompting and reward system. Finally, we propose further research directions, discuss the most relevant open questions, and draw conclusions regarding the effectiveness of social robotics in the medical treatment of autism spectrum disorders.


The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents

arXiv.org Artificial Intelligence

Developmental psychologists have long-established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and study socio-cognitive abilities enabling to enter a culture too. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI school - a tool including a customizable parameterized uite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school.


Knowing Where to Look: A Planning-based Architecture to Automate the Gaze Behavior of Social Robots

arXiv.org Artificial Intelligence

Gaze cues play an important role in human communication and are used to coordinate turn-taking and joint attention, as well as to regulate intimacy. In order to have fluent conversations with people, social robots need to exhibit human-like gaze behavior. Previous Gaze Control Systems (GCS) in HRI have automated robot gaze using data-driven or heuristic approaches. However, these systems tend to be mainly reactive in nature. Planning the robot gaze ahead of time could help in achieving more realistic gaze behavior and better eye-head coordination. In this paper, we propose and implement a novel planning-based GCS. We evaluate our system in a comparative within-subjects user study (N=26) between a reactive system and our proposed system. The results show that the users preferred the proposed system and that it was significantly more interpretable and better at regulating intimacy.


Joint Attention for Multi-Agent Coordination and Social Learning

arXiv.org Artificial Intelligence

Joint attention - the ability to purposefully coordinate attention with another agent, and mutually attend to the same thing -- is a critical component of human social cognition. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents' ability to solve difficult coordination tasks, by reducing the exponential cost of exploring the joint multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents' ability to learn from experts present in their environment, even when completing hard exploration tasks that do not require coordination. Taken together, these findings suggest that joint attention may be a useful inductive bias for multi-agent learning.


Electroencephalogram (EEG) for Delineating Objective Measure of Autism Spectrum Disorder (ASD) (Extended Version)

arXiv.org Machine Learning

Autism Spectrum Disorder (ASD) is a developmental disorder that often impairs a child's normal development of the brain. According to CDC, it is estimated that 1 in 6 children in the US suffer from development disorders, and 1 in 68 children in the US suffer from ASD. This condition has a negative impact on a person's ability to hear, socialize and communicate. Overall, ASD has a broad range of symptoms and severity; hence the term spectrum is used. One of the main contributors to ASD is known to be genetics. Up to date, no suitable cure for ASD has been found. Early diagnosis is crucial for the long-term treatment of ASD, but this is challenging due to the lack of a proper objective measures. Subjective measures often take more time, resources, and have false positives or false negatives. There is a need for efficient objective measures that can help in diagnosing this disease early as possible with less effort. EEG measures the electric signals of the brain via electrodes placed on various places on the scalp. These signals can be used to study complex neuropsychiatric issues. Studies have shown that EEG has the potential to be used as a biomarker for various neurological conditions including ASD. This chapter will outline the usage of EEG measurement for the classification of ASD using machine learning algorithms.


How Will We Keep People, Not Machines, At The Center Of The Future Of Work?

#artificialintelligence

Get this: 30% of American jobs will be automated by 2030, according to an estimate from McKinsey Global Institute. The inevitable march of technological progress ensures that many of us will be looking for work in the near future. Machines are simply better than us at a wide and growing set of tasks. We can't compete with robots at robot tasks. So why do we insist on trying?


Discovery and usage of joint attention in images

arXiv.org Artificial Intelligence

Joint visual attention is characterized by two or more individuals looking at a common target at the same time. The ability to identify joint attention in scenes, the people involved, and their common target, is fundamental to the understanding of social interactions, including others' intentions and goals. In this work we deal with the extraction of joint attention events, and the use of such events for image descriptions. The work makes two novel contributions. First, our extraction algorithm is the first which identifies joint visual attention in single static images. It computes 3-D gaze direction, identifies the gaze-target by combining gaze-direction with a 3-D depth map computed for the image, and identifies the common gaze target. Second, we use a human study to demonstrate the sensitivity of humans to joint attention, suggesting that the detection of such a configuration in an image can be useful for understanding the image, including the goals of the agents and their joint activity, and therefore can contribute to image captioning and related tasks.